Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available without charge during the embargo (administrative interval).
- 
            Integrating multimodal data such as RGB and LiDAR from multiple views significantly increases computational and communication demands, which is challenging for resource-constrained autonomous agents that must still meet the time-critical deadlines of mission-critical applications. To address this challenge, we propose CoOpTex, a collaborative task execution framework designed for cooperative perception in distributed autonomous systems (DAS). CoOpTex's contribution is twofold: (a) CoOpTex fuses multiview RGB images into a panoramic camera view for 2D object detection and uses 360° LiDAR for 3D object detection, improving accuracy with a lightweight Graph Neural Network (GNN) that integrates object coordinates from both perspectives; (b) to optimize task execution and meet deadlines, CoOpTex dynamically offloads computationally intensive image-stitching tasks to auxiliary devices when available and adjusts RGB frame capture rates based on device mobility and processing capabilities. We implement CoOpTex in real time on static and mobile heterogeneous autonomous agents, where it eliminates deadline violations (a 100% reduction) while improving 2D-detection frame rates by 2.2 times in stationary and 2 times in mobile conditions, demonstrating its effectiveness in enabling real-time cooperative perception. (An illustrative sketch of the offloading and frame-rate policy appears after this list.)
            Free, publicly-accessible full text available June 9, 2026.
- 
            In network-constrained environments, distributed multi-agent systems such as UGVs and UAVs must communicate effectively to support computationally demanding scene perception tasks like semantic and instance segmentation. These tasks are challenging because they require high accuracy even from low-quality images, while network limitations restrict how much data can be transmitted between agents. To overcome these challenges, we propose TAVIC-DAS, which performs task- and channel-aware variable-rate image compression to enable distributed task execution and minimize communication latency by transmitting compressed images. TAVIC-DAS introduces a novel image compression and decompression framework (distributed across agents) that integrates channel parameters such as RSSI and data rate into a task-specific "semantic segmentation" DNN to generate masks of the objects of interest in the scene (ROI maps), assigning the high pixel density needed to represent objects of interest and a low density to the surrounding pixels within an image. Additionally, to accommodate agents with limited computational resources, TAVIC-DAS incorporates resource-aware model quantization. We evaluated TAVIC-DAS on platforms such as ROSMaster X3 and Jetson Xavier, which communicated over a low-frequency proprietary Doodle radio operating at 915 MHz. The experimental results show that TAVIC-DAS achieves approximately 7.62% higher PSNR and is about 6.39% more resource efficient than state-of-the-art techniques. (An illustrative sketch of ROI-aware, channel-driven quantization appears after this list.)
            Free, publicly-accessible full text available March 17, 2026.
- 
            Liang, Xuefeng (Ed.)
            Deep learning has achieved state-of-the-art video action recognition (VAR) performance by learning action-related features from raw video. However, these models often jointly encode auxiliary view information (viewpoints and sensor properties) with the primary action features, leading to performance degradation under novel views and to security concerns, since sensor types and locations can be revealed. Here, we systematically study these shortcomings of VAR models and develop a novel approach, VIVAR, that learns view-invariant spatiotemporal action features with the view information removed. In particular, we leverage contrastive learning to separate actions and jointly optimize an adversarial loss that aligns view distributions, removing auxiliary view information in the deep embedding space, using unlabeled synchronous multiview (MV) video to learn a view-invariant VAR system. We evaluate VIVAR using our in-house large-scale time-synchronous MV video dataset containing 10 actions captured from three angular viewpoints and sensors in diverse environments. VIVAR successfully captures view-invariant action features, improves inter- and intra-action cluster quality, and consistently outperforms SoTA models with 8% higher accuracy. We additionally perform extensive studies with our datasets, model architectures, and multiple contrastive learning and view distribution alignment variants to provide insights into VIVAR. We open-source our code and dataset to facilitate further research in view-invariant systems. (An illustrative sketch of the contrastive-plus-adversarial objective appears after this list.)
            Free, publicly-accessible full text available March 10, 2026.
- 
            Recent advancements in deep learning-based wearable human action recognition (wHAR) have improved the capture and classification of complex motions, but adoption remains limited due to the lack of expert annotations and the domain discrepancies caused by user variations. Limited annotations hinder the model's ability to generalize to out-of-distribution samples. While data augmentation can improve generalizability, unsupervised augmentation techniques must be applied carefully to avoid introducing noise. Unsupervised domain adaptation (UDA) addresses domain discrepancies by aligning conditional distributions with pseudo-labeled target samples, but vanilla pseudo-labeling can lead to error propagation. To address these challenges, we propose μDAR, a novel joint optimization architecture comprising three functions: (i) a consistency regularizer between augmented samples to improve the model's classification generalizability, (ii) a temporal ensemble for robust pseudo-label generation, and (iii) conditional distribution alignment to improve domain generalizability. The temporal ensemble aggregates predictions from past epochs to smooth out noisy pseudo-label predictions, which are then used in the conditional distribution alignment module to minimize a kernel-based class-wise conditional maximum mean discrepancy (kCMMD) between the source and target feature spaces and learn a domain-invariant embedding. The consistency-regularized augmentations ensure that multiple augmentations of the same sample share the same label; this results in (a) strong generalization with limited source-domain samples and (b) consistent pseudo-label generation on target samples. The novel integration of these three modules in μDAR yields a ~4-12% average macro-F1 score improvement over six state-of-the-art UDA methods on four benchmark wHAR datasets. (An illustrative sketch of the temporal ensemble and kCMMD terms appears after this list.)
            Free, publicly-accessible full text available December 9, 2025.
- 
            Robust communication is vital for multi-agent robotic systems involving heterogeneous agents, such as Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs), operating in dynamic and contested environments. These agents often communicate to collaboratively execute critical perception tasks and face several communication challenges: (a) the disparity in velocity between agents results in rapidly changing distances, which in turn affects physical channel parameters such as the Received Signal Strength Indicator (RSSI), the data rate (applicable to certain networks), and, most importantly, reliable data transfer; (b) because these devices operate in outdoor, network-deprived environments, they tend to use proprietary low-frequency network technologies for long-range communication, which greatly reduces the available bandwidth. This poses a challenge when sending large amounts of data for time-critical applications. To mitigate these challenges, we propose DACC-Comm, an adaptive flow control and compressive sensing framework that dynamically adjusts the receiver window size and selectively samples image pixels based on network parameters such as latency, data rate, and RSSI, as well as physical factors such as the variation in movement speed between devices. DACC-Comm employs a state-of-the-art DNN (TabNet) to optimize the payload and reduce retransmissions in the network, in turn maintaining low latency. The multi-head transformer-based prediction model takes the network parameters and physical factors as input and outputs (a) an optimal receiver window size for TCP, determining how many bytes can be sent before the sender must wait for an acknowledgment (ACK) from the receiver, and (b) a compression ratio used to sample a subset of pixels from an image. We propose a novel sampling strategy to select the image pixels, which are then encoded using a feature extractor. To reduce the amount of data sent across the network, the extracted feature is further quantized to INT8 via post-training quantization. We evaluate DACC-Comm on an experimental testbed comprising Jackal and ROSMaster2 UGV devices that communicate image features using a proprietary Doodle radio at 915 MHz. We demonstrate that DACC-Comm improves the retransmission rate by ≈17% and reduces the overall latency by ≈12%, while the novel compressive sensing strategy reduces the overall payload by ≈56%. (An illustrative sketch of the window/compression predictor and INT8 quantization appears after this list.)
            Free, publicly-accessible full text available January 6, 2026.
- 
            The ubiquity of smart and wearable devices with integrated acoustic sensors in modern life presents tremendous opportunities for recognizing human activities in our living spaces through ML-driven applications. However, their adoption is often hindered by the requirement for large amounts of labeled data during the model training phase. Integrating contextual metadata has the potential to alleviate this, since such metadata is often less dynamic (e.g., cleaning dishes and cooking both happen in the kitchen context) and can often be annotated less tediously (a sensor is always placed in the kitchen). However, most models do not have good provisions for integrating such metadata; often, the additional metadata is leveraged through multi-task learning with sub-optimal outcomes. On the other hand, reliably recognizing distinct in-home activities with similar acoustic patterns (e.g., chopping, hammering, knife sharpening) poses another set of challenges. To mitigate these challenges, we first show in a preliminary study that room acoustic properties such as reverberation, room materials, and background noise leave a discernible fingerprint in audio samples that can be used to recognize the room context, and we propose AcouDL as a unified framework that exploits room-context information to improve activity recognition performance. Our self-supervision-based approach first learns the context features of the activities from a large amount of unlabeled data using a contrastive learning mechanism and then incorporates these features, via a novel attention mechanism, into the activity classification pipeline to improve recognition performance. Extensive evaluation of AcouDL on three datasets containing a wide range of activities shows that this efficient feature-fusion mechanism enables the incorporation of metadata that helps better recognize activities under challenging classification scenarios, with a 0.7-3.5% macro-F1 score improvement over the baselines. (An illustrative sketch of the attention-based context fusion appears after this list.)
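
To make the CoOpTex scheduling idea above concrete, here is a minimal sketch of a hypothetical offloading and frame-rate policy: stitching is offloaded when an auxiliary device is reachable and the local estimate would consume most of the deadline, and the RGB capture rate is throttled under mobility or heavy load. The `AgentState` fields, thresholds, and `plan_cycle` helper are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    # Hypothetical runtime measurements on the autonomous agent.
    is_mobile: bool             # agent is currently moving
    local_load: float           # 0.0 (idle) .. 1.0 (saturated) compute utilization
    aux_device_reachable: bool  # an auxiliary device answered a recent heartbeat
    deadline_ms: float          # per-frame processing deadline

def plan_cycle(state: AgentState, est_stitch_ms: float, base_fps: int = 30):
    """Decide where to run image stitching and how fast to capture RGB frames.

    Returns (offload_stitching, target_fps). Thresholds are illustrative.
    """
    # Offload the stitching stage when help is available and the local estimate
    # alone would already consume most of the deadline.
    offload_stitching = (
        state.aux_device_reachable and est_stitch_ms > 0.5 * state.deadline_ms
    )

    # Throttle capture under mobility or high load so frames are not produced
    # faster than they can be processed before the deadline.
    target_fps = base_fps
    if state.is_mobile:
        target_fps = int(target_fps * 0.75)
    if state.local_load > 0.8 and not offload_stitching:
        target_fps = max(5, int(target_fps * 0.5))
    return offload_stitching, target_fps

# Example: a mobile agent near saturation with an auxiliary device in range.
print(plan_cycle(AgentState(True, 0.9, True, deadline_ms=100.0), est_stitch_ms=70.0))
```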
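
The TAVIC-DAS entry describes driving per-region image quality from a segmentation-derived ROI map together with channel measurements. The sketch below is a simplified stand-in for that idea: an assumed mapping from RSSI and data rate to a quality scalar, then coarse quantization outside the ROI and fine quantization inside it. The `channel_quality` mapping and the step sizes are hypothetical, not the learned codec from the paper.

```python
import numpy as np

def channel_quality(rssi_dbm: float, data_rate_mbps: float) -> float:
    """Map channel conditions to a global quality scalar in [0, 1].
    The mapping is an illustrative assumption, not the paper's learned policy."""
    rssi_term = np.clip((rssi_dbm + 90.0) / 40.0, 0.0, 1.0)   # -90 dBm -> 0, -50 dBm -> 1
    rate_term = np.clip(data_rate_mbps / 10.0, 0.0, 1.0)      # saturate at 10 Mbps
    return 0.5 * (rssi_term + rate_term)

def roi_aware_quantize(image: np.ndarray, roi_mask: np.ndarray,
                       rssi_dbm: float, data_rate_mbps: float) -> np.ndarray:
    """Keep fine detail inside the ROI and coarsely quantize the background.

    image: HxWx3 uint8; roi_mask: HxW bool (e.g., from a segmentation DNN).
    """
    q = channel_quality(rssi_dbm, data_rate_mbps)
    # Quantization step sizes: a smaller step means higher pixel fidelity.
    roi_step = int(round(np.interp(q, [0.0, 1.0], [16, 2])))
    bg_step = int(round(np.interp(q, [0.0, 1.0], [64, 16])))
    steps = np.where(roi_mask[..., None], roi_step, bg_step)
    return ((image // steps) * steps).astype(np.uint8)

# Example with a synthetic frame and a centered ROI under a weak channel.
frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
mask = np.zeros((120, 160), dtype=bool)
mask[40:80, 60:100] = True
compressed = roi_aware_quantize(frame, mask, rssi_dbm=-80.0, data_rate_mbps=2.0)
```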
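
For the VIVAR entry, the recipe is a contrastive objective that separates actions combined with an adversarial term that removes view information from the embedding. Below is a minimal PyTorch sketch of that general pattern using a gradient-reversal layer and a small view discriminator; the stand-in encoder, layer sizes, loss weighting, and the simplified supervised-contrastive loss are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def supcon_loss(z, labels, tau=0.1):
    """Simplified supervised contrastive loss over L2-normalized embeddings."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0)                                      # exclude self-pairs
    logits = sim - torch.eye(len(z), device=z.device) * 1e9    # mask self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -((pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)).mean()

# Stand-in embedding head and view discriminator (3 viewpoints assumed).
encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
view_disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))

feats = torch.randn(32, 512)             # pretend clip features from a video backbone
action_y = torch.randint(0, 10, (32,))   # 10 actions
view_y = torch.randint(0, 3, (32,))      # synchronous multiview labels

z = encoder(feats)
# Separate actions contrastively while the reversed gradient pushes the encoder
# to remove whatever the view discriminator can exploit.
loss = supcon_loss(z, action_y) + 0.5 * F.cross_entropy(view_disc(GradReverse.apply(z, 1.0)), view_y)
loss.backward()
```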
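
The μDAR entry combines a temporal ensemble of predictions for robust pseudo-labels with a kernel-based class-wise conditional MMD (kCMMD). The sketch below shows one plausible form of those two pieces: an EMA of softmax outputs across epochs, and an RBF-kernel MMD computed per class between source features and pseudo-labeled target features. The kernel choice, bandwidth, and EMA coefficient are assumptions.

```python
import torch
import torch.nn.functional as F

def update_temporal_ensemble(ensemble, probs, epoch, alpha=0.6):
    """EMA of per-sample softmax predictions across epochs (temporal ensemble).
    Returns the updated ensemble and a bias-corrected pseudo-label distribution."""
    ensemble = alpha * ensemble + (1 - alpha) * probs
    corrected = ensemble / (1 - alpha ** (epoch + 1))  # startup bias correction
    return ensemble, corrected

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between two feature sets under an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def kcmmd(src_feat, src_y, tgt_feat, tgt_pseudo_y, num_classes):
    """Class-wise conditional MMD: match source/target features class by class,
    using pseudo-labels on the unlabeled target domain."""
    losses = []
    for c in range(num_classes):
        xs, xt = src_feat[src_y == c], tgt_feat[tgt_pseudo_y == c]
        if len(xs) > 1 and len(xt) > 1:
            losses.append(rbf_mmd(xs, xt))
    return torch.stack(losses).mean() if losses else src_feat.new_zeros(())

# Toy usage with random features for a 6-class wHAR problem.
C = 6
src_feat, tgt_feat = torch.randn(64, 128), torch.randn(64, 128)
src_y = torch.randint(0, C, (64,))
ensemble = torch.zeros(64, C)
ensemble, corrected = update_temporal_ensemble(ensemble, F.softmax(torch.randn(64, C), 1), epoch=0)
loss = kcmmd(src_feat, src_y, tgt_feat, corrected.argmax(1), C)
```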
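
The DACC-Comm entry predicts a TCP receiver window size and a pixel-sampling ratio from network and mobility measurements, then quantizes extracted features to INT8. The sketch below substitutes a small MLP (`CommPolicyNet`) for the paper's TabNet/transformer predictor and uses random pixel selection and symmetric INT8 quantization as stand-ins; all names, output ranges, and input features are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

class CommPolicyNet(nn.Module):
    """Stand-in for the paper's predictor: maps measured network and mobility
    features to (receiver-window bytes, pixel keep-ratio)."""
    def __init__(self, in_dim=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 2))
    def forward(self, x):
        raw = self.body(x)
        window = 4096 + torch.sigmoid(raw[:, 0]) * (65535 - 4096)  # bytes, assumed range
        keep_ratio = 0.1 + torch.sigmoid(raw[:, 1]) * 0.9          # fraction of pixels kept
        return window, keep_ratio

def sample_pixels(img: np.ndarray, keep_ratio: float, seed: int = 0):
    """Keep a random subset of pixel positions (illustrative sampling strategy)."""
    h, w = img.shape[:2]
    rng = np.random.default_rng(seed)
    idx = rng.choice(h * w, size=int(keep_ratio * h * w), replace=False)
    return idx, img.reshape(h * w, -1)[idx]

def quantize_int8(feat: np.ndarray):
    """Symmetric post-training-style INT8 quantization of a feature vector."""
    scale = np.abs(feat).max() / 127.0 + 1e-12
    return np.clip(np.round(feat / scale), -127, 127).astype(np.int8), scale

# Toy usage: latency (ms), data rate (Mbps), RSSI (dBm), relative speed (m/s).
net_state = torch.tensor([[35.0, 2.0, -78.0, 1.5]])
window, keep = CommPolicyNet()(net_state)
idx, kept = sample_pixels(np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8), float(keep))
q_feat, scale = quantize_int8(np.random.randn(256).astype(np.float32))
```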
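
Finally, the AcouDL entry fuses a self-supervised room-context embedding into the activity classifier with an attention mechanism. The sketch below shows one plausible cross-attention fusion in PyTorch, where the context vector queries frame-level activity features before classification; the dimensions, pooling, and fusion form are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ContextAttentionFusion(nn.Module):
    """Fuse a room-context embedding into activity features with cross-attention:
    the context vector queries the sequence of frame-level activity features."""
    def __init__(self, dim=128, n_classes=12, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, activity_feats, context_vec):
        # activity_feats: (B, T, dim) frame-level features from the audio encoder
        # context_vec:    (B, dim) room-context embedding from contrastive pretraining
        q = context_vec.unsqueeze(1)                      # (B, 1, dim) query
        attended, _ = self.attn(q, activity_feats, activity_feats)
        pooled = activity_feats.mean(dim=1)               # plain pooled activity summary
        return self.classifier(torch.cat([pooled, attended.squeeze(1)], dim=-1))

# Toy usage: 8 clips, 50 audio frames each, 128-d features, 12 activity classes.
model = ContextAttentionFusion()
logits = model(torch.randn(8, 50, 128), torch.randn(8, 128))
```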